Given our current inclination for using a stack machine to interpret IR1, it would be straightforward to layer a byte-code interpreter on top of this.
Interpreter:
Instead of having no interpreter, or a more-or-less conventional interpreter, or byte-code interpreter, how about directly executing IR1?
We run through the IR1 passes, possibly skipping optional ones, until we get through environment analysis. Then we run a post-pass that annotates IR1 with information about where values are kept, i.e. the stack slot.
We can lazily convert functions by having FUNCTION make an interpreted function object that holds the code (really a closure over the interpreter). The first time that we try to call the function, we do the conversion and processing. Also, we can easily keep track of which interpreted functions we have expanded macros in, so that macro redefinition automatically invalidates the old expansion, causing lazy reconversion.
Probably the interpreter will want to represent MVs by a recognizable structure that is always heap-allocated. This way, we can punt the stack issues involved in trying to spread MVs. So a continuation value can always be kept in a single cell.
The compiler can have some special frobs for making the interpreter efficient, such as a call operation that extracts arguments from the stack slots designated by a continuation list. Perhaps (values-mapcar fun . lists) <==> (values-list (mapcar fun . lists)) This would be used with MV-CALL.
This scheme seems to provide nearly all of the advantages of both the compiler and conventional interpretation. The only significant disadvantage with respect to a conventional interpreter is that there is the one-time overhead of conversion, but doing this lazily should make this quite acceptable.
With respect to a conventional interpreter, we have major advantages: + Full syntax checking: safety comparable to compiled code. + Semantics similar to compiled code due to code sharing. Similar diagnostic messages, etc. Reduction of error-prone code duplication. + Potential for full type checking according to declarations (would require running IR1 optimize?) + Simplifies debugger interface, since interpreted code can look more like compiled code: source paths, edit definition, etc.
For all non-run-time symbol annotations (anything other than SYMBOL-FUNCTION and SYMBOL-VALUE), we use the compiler's global database. MACRO-FUNCTION will use INFO, rather than vice-versa.
When doing the IR1 phases for the interpreter, we probably want to suppress optimizations that change user-visible function calls: – Don't do local call conversion of any named functions (even lexical ones). This is so that a call will appear on the stack that looks like the call in the original source. The keyword and optional argument transformations done by local call mangle things quite a bit. Also, note local-call converting prevents unreferenced arguments from being deleted, which is another non-obvious transformation. – Don't run source-transforms, IR1 transforms and IR1 optimizers. This way, TRACE and BACKTRACE will show calls with the original arguments, rather than the "optimized" form, etc. Also, for the interpreter it will actually be faster to call the original function (which is compiled) than to "inline expand" it. Also, this allows implementation-dependent transforms to expand into
There are some problems with stepping, due to our non-syntactic IR1 representation. The source path information is the key that makes this conceivable. We can skip over the stepping of a subform by quietly evaluating nodes whose source path lies within the form being skipped.
One problem with determining what value has been returned by a form. With a function call, it is theoretically possible to precisely determine this, since if we complete evaluation of the arguments, then we arrive at the Combination node whose value is synonymous with the value of the form. We can even detect this case, since the Node-Source will be EQ to the form. And we can also detect when we unwind out of the evaluation, since we will leave the form without having ever reached this node.
But with macros and special-forms, there is no node whose value is the value of the form, and no node whose source is the macro call or special form. We can still detect when we leave the form, but we can't be sure whether this was a normal evaluation result or an explicit RETURN-FROM.
But does this really matter? It seems that we can print the value returned (if any), then just print the next form to step. In the rare case where we did unwind, the user should be able to figure it out.
[We can look at this as a side-effect of CPS: there isn't any difference between a "normal" return and a non-local one.]
[Note that in any control transfer (normal or otherwise), the stepper may need to unwind out of an arbitrary number of levels of stepping. This is because a form in a TR position may yield its to a node arbitrarily far our.]
Another problem is with deciding what form is being stepped. When we start evaluating a node, we dive into code that is nested somewhere down inside that form. So we actually have to do a loop of asking questions before we do any evaluation. But what do we ask about?
If we ask about the outermost enclosing form that is a subform of the the last form that the user said to execute, then we might offer a form that isn't really evaluated, such as a LET binding list.
But once again, is this really a problem? It is certainly different from a conventional stepper, but a pretty good argument could be made that it is superior. Haven't you ever wanted to skip the evaluation of all the LET bindings, but not the body? Wouldn't it be useful to be able to skip the DO step forms?
All of this assumes that nobody ever wants to step through the guts of a macroexpansion. This seems reasonable, since steppers are for weenies, and weenies don't define macros (hence don't debug them). But there are probably some weenies who don't know that they shouldn't be writing macros.
We could handle this by finding the "source paths" in the expansion of each macro by sticking some special frob in the source path marking the place where the expansion happened. When we hit code again that is in the source, then we revert to the normal source path. Something along these lines might be a good idea anyway (for compiler error messages, for example).
The source path hack isn't guaranteed to work quite so well in generated code, though, since macros return stuff that isn't freshly consed. But we could probably arrange to win as long as any given expansion doesn't return two EQ forms.
It might be nice to have a command that skipped stepping of the form, but printed the results of each outermost enclosed evaluated subform, i.e. if you used this on the DO step-list, it would print the result of each new-value form. I think this is implementable. I guess what you would do is print each value delivered to a DEST whose source form is the current or an enclosing form. Along with the value, you would print the source form for the node that is computing the value.
The stepper can also have a "back" command that "unskips" or "unsteps". This would allow the evaluation of forms that are pure (modulo lexical variable setting) to be undone. This is useful, since in stepping it is common that you skip a form that you shouldn't have, or get confused and want to restart at some earlier point.
What we would do is remember the current node and the values of all local variables. heap before doing each step or skip action. We can then back up the state of all lexical variables and the "program counter". To make this work right with set closure variables, we would copy the cell's value, rather than the value cell itself.
[To be fair, note that this could easily be done with our current interpreter: the stepper could copy the environment alists.]
We can't back up the "program counter" when a control transfer leaves the current function, since this state is implicitly represented in the interpreter's state, and is discarded when we exit. We probably want to ask for confirmation before leaving the function to give users a chance to "unskip" the forms in a TR position.
Another question is whether the conventional stepper is really a good thing to imitate... How about an editor-based mouse-driven interface? Instead of "skipping" and "stepping", you would just designate the next form that you wanted to stop at. Instead of displaying return values, you replace the source text with the printed representation of the value.
It would show the "program counter" by highlighting the *innermost* form that we are about to evaluate, i.e. the source form for the node that we are stopped at. It would probably also be useful to display the start of the form that was used to designate the next stopping point, although I guess this could be implied by the mouse position.
Such an interface would be a little harder to implement than a dumb stepper, but it would be much easier to use. [It would be impossible for an evalhook stepper to do this.]
Note: compiler, not a general syntax for accessing "sub-primitives". It's main use is in implementation-dependent compiler transforms. It saves us the effort of defining a "phony function" (that is not really defined), and also allows direct communication with the code generator through codegen-info arguments.
Some primitives may be exported from the VM so that make it explicit that an escape routine or interpreter stub is assuming an operation is implemented by the compiler.